Wine Quality Analysis¶
Created 17-Oct-2024 Mark A. Goforth, Ph.D.¶
Purpose¶
This notebook is designed for EDA and train a DNN model to perform a quality estimation of wine by it's chemical composition.
Goal¶
Challenges & Discussion¶
General Steps for Approach¶
Download data
- wine quality data is downloaded from kaggle
EDA
- identify independent variables that influence the outcome
Feature Engineering
- normalize and standardize independent variables as necessary
- reduce dimensionality
Train/Test Split
- split data for training and final testing to see how performance will be in the real world
- use random shuffle and stratified split to preserve proportions of classes
Model Selection, Cross Validation, and Tuning
- use K-fold cross validation to reduce bias, build more generalized model and prevent overfitting
- apply hyperparameter tuning to search for best settings that provide improved bias and variance
Model Validation
- run model on test set to see how model will perform on real world data
Create GAN (TBD)
- create a Generative Adversarial Network (GAN) deep learning architecture
- train two neural networks to compete against each other to generate more authentic new data from a given training dataset
Create VAE (TBD)
- create a Variational Autoencoder (VAE) deep learning architecture
- train neural network to use in anomaly detection
Conclusion¶
Install any necessary python packages¶
In [ ]:
!pip install kagglehub
In [ ]:
!pip install tensorflow
In [ ]:
!pip install keras_tuner
Import Libraries¶
In [2]:
import datetime
import time
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.stats as stats
import statsmodels.api as sm
# import pylab as plt
from IPython.display import Image
from IPython.core.display import HTML
from pylab import rcParams
import sklearn
from sklearn import decomposition
from sklearn.decomposition import PCA
from sklearn import datasets
import kagglehub
import ppscore as pps
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, KFold, GroupShuffleSplit
from sklearn import metrics
from sklearn.metrics import confusion_matrix
import pickle
import tensorflow as tf
from tensorflow.keras import datasets, layers, models, losses
from matplotlib import pyplot as plt
import keras_tuner
import keras
Download latest dataset version¶
In [3]:
pathstr = kagglehub.dataset_download("adarshde/wine-quality-dataset")
print("Path to dataset files:", pathstr)
df = pd.read_csv(pathstr+'/winequality-dataset_updated.csv')
# df = df.drop_duplicates()
Path to dataset files: /Users/Mark/.cache/kagglehub/datasets/adarshde/wine-quality-dataset/versions/3
Exploratory Data Analysis (EDA)¶
In [4]:
df.head()
Out[4]:
| fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.3 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.2 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1999 entries, 0 to 1998 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 fixed acidity 1999 non-null float64 1 volatile acidity 1999 non-null float64 2 citric acid 1999 non-null float64 3 residual sugar 1999 non-null float64 4 chlorides 1999 non-null float64 5 free sulfur dioxide 1999 non-null float64 6 total sulfur dioxide 1999 non-null float64 7 density 1999 non-null float64 8 pH 1999 non-null float64 9 sulphates 1999 non-null float64 10 alcohol 1999 non-null float64 11 quality 1999 non-null int64 dtypes: float64(11), int64(1) memory usage: 187.5 KB
In [6]:
df.describe().T.style.background_gradient(axis=0)
Out[6]:
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| fixed acidity | 1999.000000 | 8.670335 | 2.240023 | 4.600000 | 7.100000 | 8.000000 | 9.900000 | 15.900000 |
| volatile acidity | 1999.000000 | 0.541773 | 0.180381 | 0.120000 | 0.400000 | 0.530000 | 0.660000 | 1.580000 |
| citric acid | 1999.000000 | 0.246668 | 0.181348 | 0.000000 | 0.110000 | 0.200000 | 0.385000 | 1.000000 |
| residual sugar | 1999.000000 | 3.699090 | 3.290201 | 0.900000 | 2.000000 | 2.300000 | 3.460000 | 15.990000 |
| chlorides | 1999.000000 | 0.075858 | 0.048373 | 0.010000 | 0.056000 | 0.075000 | 0.086000 | 0.611000 |
| free sulfur dioxide | 1999.000000 | 20.191096 | 15.642224 | 1.000000 | 9.000000 | 16.000000 | 27.000000 | 72.000000 |
| total sulfur dioxide | 1999.000000 | 52.617809 | 37.051121 | 6.000000 | 24.000000 | 42.000000 | 73.000000 | 289.000000 |
| density | 1999.000000 | 0.996477 | 0.002110 | 0.990070 | 0.995265 | 0.996600 | 0.997800 | 1.003690 |
| pH | 1999.000000 | 3.290140 | 0.274297 | 2.340000 | 3.180000 | 3.300000 | 3.420000 | 4.160000 |
| sulphates | 1999.000000 | 0.949465 | 0.780523 | 0.330000 | 0.560000 | 0.650000 | 0.840000 | 3.990000 |
| alcohol | 1999.000000 | 10.671161 | 1.369932 | 8.400000 | 9.500000 | 10.400000 | 11.400000 | 15.000000 |
| quality | 1999.000000 | 5.637819 | 1.255574 | 2.000000 | 5.000000 | 6.000000 | 6.000000 | 9.000000 |
Attribute Information¶
| Feature | Explain |
|---|---|
| fixed acidity | most acids involved with wine or fixed or nonvolatile |
| volatile acidity | the amount of acetic acid in wine |
| citric acid | the amount of citric acid in wine |
| residual sugar | the amount of sugar remaining after fermentation stops |
| chlorides | the amount of salt in the wine |
| free sulfur dioxide | the amount of free sulfur dioxide in the wine(those available to react and thus exhibit both germicidal and antioxidant properties) |
| total sulfur dioxide | amount of free and bound forms of SO2 |
| density | the measurement of how tightly a material is packed together |
| PH | describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 |
| Alcohol | the percent alcohol content of the wine |
| quality | output variable (based on sensory data, score between 3 and 8) |
check for missing values¶
In [7]:
df.isna().sum()
Out[7]:
fixed acidity 0 volatile acidity 0 citric acid 0 residual sugar 0 chlorides 0 free sulfur dioxide 0 total sulfur dioxide 0 density 0 pH 0 sulphates 0 alcohol 0 quality 0 dtype: int64
Visualization - create histograms for each independent variable¶
In [8]:
for i in df.columns:
plt.figure(figsize=(6, 4))
sns.histplot(data=df[i])
plt.title(f'{i}')
plt.tight_layout()
plt.show()
Visualization - create box plots¶
In [9]:
columns = list(df.columns)
fig, ax = plt.subplots(11, 2, figsize=(15, 45))
plt.subplots_adjust(hspace = 0.5)
for i in range(11) :
# ax 1
sns.boxplot(x=columns[i], data=df, ax=ax[i, 0])
# ax 2
sns.scatterplot(x=columns[i], y='quality', data=df, hue='quality', ax=ax[i, 1])
compare each dependent variable with quality using box plots¶
In [10]:
for i in df.columns:
if i != 'quality':
plt.figure(figsize=(6, 4)) # Set figure size for each plot
sns.boxplot(data=df, x='quality', y= i)
plt.title(f'Box plot for quality and {i}')
plt.tight_layout()
plt.show()
In [11]:
for i in df.columns:
if i != 'quality':
plt.figure(figsize=(6, 4))
sns.violinplot(data=df, x='quality', y=i)
plt.title(f'Violin plot for {i} by Quality')
plt.tight_layout()
plt.show()
Correlate each dependent variable with quality¶
In [12]:
%matplotlib inline
rcParams['figure.figsize'] = 12, 10
sns.set_style('whitegrid')
In [13]:
# Plotting the correlation heatmap
dataplot = sns.heatmap(df.corr(), cmap="YlGnBu", annot=True, annot_kws={"size": 12})
# Displaying heatmap
plt.show()
In [14]:
rcParams['figure.figsize'] = 15, 15
sns.pairplot(df, hue='quality', corner = True, palette='Blues')
Out[14]:
<seaborn.axisgrid.PairGrid at 0x16b3772f0>
In [15]:
# Plot the top N components
dfc = df.corr().iloc[:-1,-1:].sort_values(by='quality', ascending=True)
# dfc = df.corr().iloc[-1:,:-1].sort_values(by='quality', ascending=False).transpose()
# dfc = dfc.set_index('Source').rename_axis(None)
# dfc = df.corr().iloc[:-1,-1:].sort_values(by='quality', ascending=False).transpose()
# type(dfc)
# dfc.loc['quality'].plot(kind='bar', figsize=(10,4) )
dfc.plot.barh(figsize=(10,4) )
Out[15]:
<Axes: >
Prepare data for machine learning training¶
In [16]:
X = df.drop('quality', axis=1)
variable_names = X.columns
In [17]:
variable_names
Out[17]:
Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
'pH', 'sulphates', 'alcohol'],
dtype='object')
In [18]:
X.head()
Out[18]:
| fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.3 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 |
| 4 | 7.2 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 |
In [19]:
pca = decomposition.PCA()
wine_pca = pca.fit_transform(X)
explained_variance = pca.explained_variance_ratio_
In [20]:
comps = pd.DataFrame(pca.components_, columns=variable_names)
comps
Out[20]:
| fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001846 | 0.000523 | -0.000357 | 0.028026 | -0.000190 | 0.247027 | 0.968584 | -0.000004 | -0.000489 | 0.005685 | 0.001142 |
| 1 | 0.015116 | 0.000233 | -0.002384 | 0.080831 | -0.000815 | 0.964725 | -0.248540 | -0.000020 | -0.001492 | 0.017462 | 0.021411 |
| 2 | 0.339973 | 0.000540 | 0.002002 | 0.922064 | -0.003746 | -0.088645 | -0.005505 | 0.000027 | -0.013114 | 0.108311 | 0.120032 |
| 3 | -0.937027 | 0.011934 | -0.036213 | 0.338959 | -0.000910 | -0.015761 | -0.003945 | -0.000321 | 0.029162 | -0.023276 | 0.063061 |
| 4 | 0.011237 | -0.012509 | 0.005594 | -0.144864 | -0.007441 | -0.010352 | 0.005077 | -0.000565 | 0.002478 | 0.099027 | 0.984225 |
| 5 | -0.059325 | 0.021506 | -0.046060 | -0.080693 | -0.005740 | -0.008183 | -0.001167 | -0.000347 | -0.012838 | 0.987419 | -0.110104 |
| 6 | 0.039158 | 0.135308 | -0.195042 | -0.000585 | -0.023180 | 0.000408 | 0.000164 | -0.000339 | 0.970346 | 0.002679 | -0.000589 |
| 7 | 0.024627 | 0.774559 | -0.587061 | -0.008610 | -0.026442 | -0.001231 | 0.000005 | -0.001092 | -0.227505 | -0.044762 | 0.016495 |
| 8 | -0.019086 | 0.614831 | 0.774360 | 0.003550 | 0.126313 | 0.001359 | -0.000541 | 0.003964 | 0.073640 | 0.023861 | 0.002542 |
| 9 | 0.004182 | -0.054479 | -0.119156 | 0.001617 | 0.991306 | 0.000169 | 0.000061 | 0.003609 | 0.007158 | 0.002694 | 0.007384 |
| 10 | 0.000223 | 0.001345 | 0.003371 | 0.000054 | 0.004122 | -0.000002 | -0.000002 | -0.999985 | 0.000231 | -0.000237 | -0.000516 |
In [21]:
rcParams['figure.figsize'] = 10, 10
sns.heatmap(comps, cmap='Blues', annot=True )
Out[21]:
<Axes: >
In [22]:
# Plot the top N components
maxcol = np.argmax(pca.components_, axis=1)
n_components = 5 # Number of top components to display
rcParams['figure.figsize'] = 10, 4
plt.bar(range(0, n_components ), explained_variance[:n_components])
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.title('Top {} Principal Components'.format(n_components))
plt.xticks(np.arange(5), variable_names[maxcol[0:5]])
plt.show()
In [23]:
ppscore_list = [pps.score(df, colName, 'quality') for colName in variable_names]
df_pp_score = pd.DataFrame(ppscore_list).sort_values('ppscore', ascending=False)
df_pp_score
Out[23]:
| x | y | ppscore | case | is_valid_score | metric | baseline_score | model_score | model | |
|---|---|---|---|---|---|---|---|---|---|
| 10 | alcohol | quality | 0.024893 | regression | True | mean absolute error | 0.925463 | 0.902425 | DecisionTreeRegressor() |
| 1 | volatile acidity | quality | 0.002654 | regression | True | mean absolute error | 0.925463 | 0.923007 | DecisionTreeRegressor() |
| 2 | citric acid | quality | 0.001189 | regression | True | mean absolute error | 0.925463 | 0.924362 | DecisionTreeRegressor() |
| 0 | fixed acidity | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.942051 | DecisionTreeRegressor() |
| 3 | residual sugar | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 1.055569 | DecisionTreeRegressor() |
| 4 | chlorides | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.961211 | DecisionTreeRegressor() |
| 5 | free sulfur dioxide | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.976857 | DecisionTreeRegressor() |
| 6 | total sulfur dioxide | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.970894 | DecisionTreeRegressor() |
| 7 | density | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 1.081802 | DecisionTreeRegressor() |
| 8 | pH | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.988989 | DecisionTreeRegressor() |
| 9 | sulphates | quality | 0.000000 | regression | True | mean absolute error | 0.925463 | 0.993949 | DecisionTreeRegressor() |
normalize data¶
In [24]:
# Create X from DataFrame and y as Target
X_temp = df.drop(columns='quality')
y = df.quality
In [25]:
scaler = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_temp)
X = pd.DataFrame(scaler, columns=X_temp.columns)
X.describe().T.style.background_gradient(axis=0, cmap='Blues')
Out[25]:
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| fixed acidity | 1999.000000 | 0.360207 | 0.198232 | 0.000000 | 0.221239 | 0.300885 | 0.469027 | 1.000000 |
| volatile acidity | 1999.000000 | 0.288886 | 0.123548 | 0.000000 | 0.191781 | 0.280822 | 0.369863 | 1.000000 |
| citric acid | 1999.000000 | 0.246668 | 0.181348 | 0.000000 | 0.110000 | 0.200000 | 0.385000 | 1.000000 |
| residual sugar | 1999.000000 | 0.185493 | 0.218038 | 0.000000 | 0.072896 | 0.092777 | 0.169649 | 1.000000 |
| chlorides | 1999.000000 | 0.109581 | 0.080488 | 0.000000 | 0.076539 | 0.108153 | 0.126456 | 1.000000 |
| free sulfur dioxide | 1999.000000 | 0.270297 | 0.220313 | 0.000000 | 0.112676 | 0.211268 | 0.366197 | 1.000000 |
| total sulfur dioxide | 1999.000000 | 0.164727 | 0.130923 | 0.000000 | 0.063604 | 0.127208 | 0.236749 | 1.000000 |
| density | 1999.000000 | 0.470379 | 0.154891 | 0.000000 | 0.381424 | 0.479442 | 0.567548 | 1.000000 |
| pH | 1999.000000 | 0.522055 | 0.150713 | 0.000000 | 0.461538 | 0.527473 | 0.593407 | 1.000000 |
| sulphates | 1999.000000 | 0.169253 | 0.213258 | 0.000000 | 0.062842 | 0.087432 | 0.139344 | 1.000000 |
| alcohol | 1999.000000 | 0.344115 | 0.207566 | 0.000000 | 0.166667 | 0.303030 | 0.454545 | 1.000000 |
stratify split data evenly between classes before training¶
In [26]:
def stratified_sample(df, strata_col, n_rows):
"""
Creates a stratified sample of a pandas dataframe with equal proportions for each stratum.
"""
# Get the unique values in the strata column
strata_vals = df[strata_col].unique()
# Calculate the sample size for each stratum
sample_size = int(np.ceil(n_rows / len(strata_vals)))
# Sample an equal number of rows from each stratum
samples = []
for val in strata_vals:
stratum = df[df[strata_col] == val]
sample = stratum.sample(sample_size, replace=True)
samples.append(sample)
# Concatenate the samples and return the result
result = pd.concat(samples)
return result.sample(n_rows, replace=True)
In [34]:
# stratified split data into train/test sets
ratio_train = 0.8
ratio_val = 0.1
ratio_test = 0.1
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=ratio_test, random_state=1, stratify=y)
X_test = X_test.drop(columns='quality')
# rebalance classes
X_train_balanced = stratified_sample( X_train, 'quality', 1500)
y_train_balanced = X_train_balanced['quality']
X_train_balanced = X_train_balanced.drop(columns='quality')
# split train/val with balanced classes
ratio_remaining = 1 - ratio_test
ratio_val_adjusted = ratio_val / ratio_remaining
X_train, X_val, y_train, y_val = train_test_split(X_train_balanced, y_train_balanced, test_size=ratio_val_adjusted, random_state=1, stratify=y_train_balanced)
In [35]:
df['quality'].value_counts()
Out[35]:
5 735 6 678 7 265 4 98 3 60 9 60 8 59 2 44 Name: quality, dtype: int64
In [36]:
y_train.value_counts()
Out[36]:
5 185 9 174 6 169 7 167 4 166 2 163 3 161 8 148 Name: quality, dtype: int64
In [37]:
y_val.value_counts()
Out[37]:
5 23 9 22 6 21 4 21 2 21 7 21 3 20 8 18 Name: quality, dtype: int64
In [38]:
y_test.value_counts()
Out[38]:
5 74 6 68 7 26 4 10 3 6 9 6 8 6 2 4 Name: quality, dtype: int64
In [ ]:
?? SKIP OLD CODE
# balanced split for train/val sets
df_train_stratified = stratified_sample( df_train, 'quality', 1000)
# stratified balanced split for train/test
X = df_train_stratified.drop(columns='split')
X = df_train_stratified.drop(columns='quality')
X = df_train_stratified.DataFrame(scaler, columns=X_temp.columns)
y = df_train_stratified['quality']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.10, random_state=0, stratify=y )
# test split for final model validation on unseen sample of real data
X_test = df_test.drop(columns='split')
X_test = df_test.drop(columns='quality')
X_test = df_test.DataFrame(scaler, columns=X_temp.columns)
y_test = df_test['quality']
In [39]:
# Convert labels to one-hot encoding
y_traincat = tf.keras.utils.to_categorical(y_train)
y_valcat = tf.keras.utils.to_categorical(y_val)
y_testcat = tf.keras.utils.to_categorical(y_test)
In [41]:
# create the pie chart to show rebalancing
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
df.quality.value_counts().sort_index().plot.pie(ax=ax1)
ax1.set_title('Before Sampling')
# create the second pie chart for rebalanced
# df_train_stratified.quality.value_counts().sort_index().plot.pie(ax=ax2)
y_train_balanced.value_counts().sort_index().plot.pie(ax=ax2)
ax2.set_title('After Sampling')
# set the title and adjust the layout
fig.suptitle('Class Distribution of Training Data')
fig.tight_layout()
# show the figure
plt.show()
Train DNN¶
In [42]:
#------------------------------------------------------------------------------
# hyperparameter tuning function
#------------------------------------------------------------------------------
def build_model(hp):
print(hp)
inputs = keras.Input(shape=(11, 11, 9))
x = inputs
n_layers = hp.Int( "n_layers", 2, 24 )
nodeunits = hp.Int( 'units', 4, 32 )
dropout =hp.Float( "dropout",0,0.25)
learning_rate = hp.Float( "learning_rate", 0.00001, 10 )
# batch_size = hp.Float( "batch_size", 4, 64 )
optimizer = hp.Choice( "optimizer", ["adam", "adamax"] )
# model = keras.Sequential()
# model.add(keras.layers.Dense(
# hp.Choice('units', [8, 16, 32]),
# activation='relu'))
# model.add(keras.layers.Dense(1, activation='relu'))
# model.compile(loss='mse')
#--------------------------------------
# configure model
#--------------------------------------
# number of hidden layers
# number of neurons
# activation function (relu)
# output layer (sigmoid for binary classification; softmax for binary or multiclass)
# initialize ANN
ann = tf.keras.models.Sequential()
# add hidden layers
for i in range(n_layers):
ann.add(tf.keras.layers.Dense(nodeunits,activation="relu"))
if dropout > 0:
x = layers.Dropout(dropout)(x)
# create output layer (number of units = number of classes)
# ann.add(tf.keras.layers.Dense(units=1,activation="sigmoid"))
ann.add(tf.keras.layers.Dense(units=10,activation="softmax"))
# compile model
# optimizers:
# Adam (good), AdamW, Adadelta, Adagrad, Adamax (good), Nadam, Ftrl (bad), Lion (very noise loss), SGD (good but takes alot of epochs)
if optimizer == "adamax":
opt = tf.keras.optimizers.Adamax(learning_rate)
else:
opt = tf.keras.optimizers.Adam(learning_rate)
# loss function
# mse, binary_crossentropy, categorical_crossentropy
# metrics (accuracy)
ann.compile(optimizer=opt,loss="categorical_crossentropy",metrics=['accuracy'])
return ann
In [43]:
# run start time
print("start time: "+str(datetime.datetime.now()))
starttime = time.time()
numtrials = 100
numepochs = 50
# fitout = ann.fit( X_train, Y_train, batch_size=batchsize, validation_data=(X_test,Y_test), epochs=numepochs )
hp = keras_tuner.HyperParameters()
# hp.values["model_type"] =
hp.Float(
"learning_rate",
min_value=0.001,
max_value=0.1,
sampling="log" )
hp.Int(
"n_layers",
min_value=3,
max_value=5 )
hp.Int(
"units",
min_value=11,
max_value=11 )
hp.Float(
"dropout",
min_value=0.0,
max_value=0.05 )
hp.Int(
"batch_size",
min_value=4,
max_value=32 )
hp.Choice(
"optimizer",
["adam"] )
# hyperparameter tuning
dts = str(datetime.datetime.now().isoformat(timespec="seconds"))
dts = dts.replace(":","")
pathout = "./tuner_"+dts
print("output path: "+pathout)
# tuner = keras_tuner.RandomSearch(
tuner = keras_tuner.BayesianOptimization(
build_model,
objective='val_loss', # val_accuracy val_loss
max_trials=numtrials,
directory=pathout,
hyperparameters=hp )
tuner.search( X_train, y_traincat, epochs=numepochs, validation_data=(X_val,y_valcat))
tuner.search_space_summary()
tuner.results_summary()
print( datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + " runtime: " + str(round(time.time()-starttime,3)) + " seconds" )
Trial 100 Complete [00h 00m 07s]
val_loss: 1.701892614364624
Best val_loss So Far: 1.4627546072006226
Total elapsed time: 00h 11m 54s
Search space summary
Default search space size: 6
learning_rate (Float)
{'default': 0.001, 'conditions': [], 'min_value': 0.001, 'max_value': 0.1, 'step': None, 'sampling': 'log'}
n_layers (Int)
{'default': None, 'conditions': [], 'min_value': 3, 'max_value': 5, 'step': 1, 'sampling': 'linear'}
units (Int)
{'default': None, 'conditions': [], 'min_value': 11, 'max_value': 11, 'step': 1, 'sampling': 'linear'}
dropout (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.05, 'step': None, 'sampling': 'linear'}
batch_size (Int)
{'default': None, 'conditions': [], 'min_value': 4, 'max_value': 32, 'step': 1, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'adam', 'conditions': [], 'values': ['adam'], 'ordered': False}
Results summary
Results in ./tuner_2024-10-22T065359/untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")
Trial 074 summary
Hyperparameters:
learning_rate: 0.0074544302703083674
n_layers: 4
units: 11
dropout: 0.04938495635729518
batch_size: 22
optimizer: adam
Score: 1.4627546072006226
Trial 028 summary
Hyperparameters:
learning_rate: 0.0037947236715404485
n_layers: 4
units: 11
dropout: 0.03210758114077595
batch_size: 4
optimizer: adam
Score: 1.4751256704330444
Trial 049 summary
Hyperparameters:
learning_rate: 0.007144535070604454
n_layers: 4
units: 11
dropout: 0.022585219692721422
batch_size: 7
optimizer: adam
Score: 1.5023205280303955
Trial 018 summary
Hyperparameters:
learning_rate: 0.003821897467082056
n_layers: 4
units: 11
dropout: 0.03211964593722686
batch_size: 4
optimizer: adam
Score: 1.5027107000350952
Trial 041 summary
Hyperparameters:
learning_rate: 0.007590405554212423
n_layers: 4
units: 11
dropout: 0.020283356076484457
batch_size: 7
optimizer: adam
Score: 1.5170587301254272
Trial 071 summary
Hyperparameters:
learning_rate: 0.008151383673141666
n_layers: 4
units: 11
dropout: 0.015563650824836961
batch_size: 6
optimizer: adam
Score: 1.5195058584213257
Trial 097 summary
Hyperparameters:
learning_rate: 0.004387410791857006
n_layers: 5
units: 11
dropout: 0.014184789986907293
batch_size: 17
optimizer: adam
Score: 1.5299832820892334
Trial 027 summary
Hyperparameters:
learning_rate: 0.003846585236459026
n_layers: 4
units: 11
dropout: 0.03209648374814431
batch_size: 5
optimizer: adam
Score: 1.549328088760376
Trial 087 summary
Hyperparameters:
learning_rate: 0.00306300122471149
n_layers: 4
units: 11
dropout: 0.02548569404969314
batch_size: 13
optimizer: adam
Score: 1.5562413930892944
Trial 026 summary
Hyperparameters:
learning_rate: 0.00352808252616922
n_layers: 4
units: 11
dropout: 0.032831012243722535
batch_size: 4
optimizer: adam
Score: 1.5564454793930054
2024-10-22 07:05:53 runtime: 714.114 seconds
In [44]:
# return the best hyperparameters
best_hp = tuner.get_best_hyperparameters()[0]
ann = tuner.hypermodel.build(best_hp)
<keras_tuner.src.engine.hyperparameters.hyperparameters.HyperParameters object at 0x17d8ab590>
In [45]:
# select the best model
best_model = tuner.get_best_models()[0]
best_model.summary()
<keras_tuner.src.engine.hyperparameters.hyperparameters.HyperParameters object at 0x17d8ab590>
/opt/anaconda3/lib/python3.12/site-packages/keras/src/saving/saving_lib.py:719: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 22 variables. saveable.load_own_variables(weights_store.get(inner_path))
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_2 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_3 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_4 (Dense) │ (None, 10) │ 120 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 648 (2.53 KB)
Trainable params: 648 (2.53 KB)
Non-trainable params: 0 (0.00 B)
In [46]:
numepochs = 50
# fitout = ann.fit( X_train, Y_train, batch_size=batchsize, validation_data=(X_test,Y_test), epochs=numepochs )
fitout = ann.fit( X_train, y_traincat, validation_data=(X_val,y_valcat), epochs=numepochs )
# save model
modelfilename = "ANN.keras"
ann.save(modelfilename)
# load model from file
# ann = models.load_model(modelfilename)
# print metrics
ann.summary()
Epoch 1/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.1346 - loss: 3.5397 - val_accuracy: 0.1078 - val_loss: 2.1768 Epoch 2/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.1166 - loss: 2.1740 - val_accuracy: 0.1617 - val_loss: 2.1067 Epoch 3/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.1510 - loss: 2.0800 - val_accuracy: 0.1198 - val_loss: 2.0392 Epoch 4/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.1557 - loss: 2.0303 - val_accuracy: 0.2036 - val_loss: 2.0375 Epoch 5/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.1768 - loss: 1.9928 - val_accuracy: 0.2096 - val_loss: 1.9912 Epoch 6/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2133 - loss: 1.9534 - val_accuracy: 0.1916 - val_loss: 1.9609 Epoch 7/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2025 - loss: 1.9302 - val_accuracy: 0.1856 - val_loss: 1.8807 Epoch 8/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2345 - loss: 1.8579 - val_accuracy: 0.2754 - val_loss: 1.8558 Epoch 9/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2564 - loss: 1.8188 - val_accuracy: 0.2695 - val_loss: 1.8513 Epoch 10/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2698 - loss: 1.7948 - val_accuracy: 0.2515 - val_loss: 1.8529 Epoch 11/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2661 - loss: 1.8533 - val_accuracy: 0.2216 - val_loss: 1.8807 Epoch 12/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2707 - loss: 1.8209 - val_accuracy: 0.2575 - val_loss: 1.8260 Epoch 13/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2993 - loss: 1.7979 - val_accuracy: 0.2695 - val_loss: 1.8186 Epoch 14/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2569 - loss: 1.7935 - val_accuracy: 0.2874 - val_loss: 1.8162 Epoch 15/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2698 - loss: 1.7849 - val_accuracy: 0.3054 - val_loss: 1.8301 Epoch 16/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2711 - loss: 1.7684 - val_accuracy: 0.2874 - val_loss: 1.7973 Epoch 17/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.2883 - loss: 1.7241 - val_accuracy: 0.2575 - val_loss: 1.7818 Epoch 18/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3137 - loss: 1.7333 - val_accuracy: 0.3234 - val_loss: 1.8156 Epoch 19/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3048 - loss: 1.7542 - val_accuracy: 0.2754 - val_loss: 1.7972 Epoch 20/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3174 - loss: 1.7066 - val_accuracy: 0.2335 - val_loss: 1.8081 Epoch 21/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3467 - loss: 1.7086 - val_accuracy: 0.2994 - val_loss: 1.7426 Epoch 22/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3255 - loss: 1.7223 - val_accuracy: 0.3413 - val_loss: 1.7313 Epoch 23/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3071 - loss: 1.6955 - val_accuracy: 0.3174 - val_loss: 1.7917 Epoch 24/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3396 - loss: 1.6738 - val_accuracy: 0.2814 - val_loss: 1.8449 Epoch 25/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3306 - loss: 1.6859 - val_accuracy: 0.2994 - val_loss: 1.7262 Epoch 26/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.3385 - loss: 1.6727 - val_accuracy: 0.3293 - val_loss: 1.7841 Epoch 27/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3709 - loss: 1.6238 - val_accuracy: 0.3174 - val_loss: 1.7263 Epoch 28/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3504 - loss: 1.6484 - val_accuracy: 0.3114 - val_loss: 1.7360 Epoch 29/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3633 - loss: 1.6535 - val_accuracy: 0.3593 - val_loss: 1.7664 Epoch 30/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3620 - loss: 1.6523 - val_accuracy: 0.3473 - val_loss: 1.7593 Epoch 31/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3349 - loss: 1.6692 - val_accuracy: 0.2994 - val_loss: 1.7544 Epoch 32/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3427 - loss: 1.6209 - val_accuracy: 0.2874 - val_loss: 1.8506 Epoch 33/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3087 - loss: 1.7013 - val_accuracy: 0.2994 - val_loss: 1.7154 Epoch 34/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3169 - loss: 1.6478 - val_accuracy: 0.3353 - val_loss: 1.7073 Epoch 35/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3555 - loss: 1.5861 - val_accuracy: 0.3114 - val_loss: 1.7314 Epoch 36/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3585 - loss: 1.6098 - val_accuracy: 0.3353 - val_loss: 1.6845 Epoch 37/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3990 - loss: 1.5710 - val_accuracy: 0.2994 - val_loss: 1.6980 Epoch 38/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3535 - loss: 1.6165 - val_accuracy: 0.3114 - val_loss: 1.6855 Epoch 39/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4082 - loss: 1.5475 - val_accuracy: 0.3234 - val_loss: 1.8166 Epoch 40/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3654 - loss: 1.6093 - val_accuracy: 0.2994 - val_loss: 1.7498 Epoch 41/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3639 - loss: 1.5571 - val_accuracy: 0.3713 - val_loss: 1.6956 Epoch 42/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3915 - loss: 1.5427 - val_accuracy: 0.3234 - val_loss: 1.7115 Epoch 43/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3638 - loss: 1.5470 - val_accuracy: 0.3473 - val_loss: 1.7054 Epoch 44/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3773 - loss: 1.5609 - val_accuracy: 0.3353 - val_loss: 1.6649 Epoch 45/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4059 - loss: 1.5469 - val_accuracy: 0.3174 - val_loss: 1.7451 Epoch 46/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3632 - loss: 1.5654 - val_accuracy: 0.3234 - val_loss: 1.6650 Epoch 47/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3861 - loss: 1.5364 - val_accuracy: 0.3353 - val_loss: 1.7242 Epoch 48/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3804 - loss: 1.4801 - val_accuracy: 0.3533 - val_loss: 1.6914 Epoch 49/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3698 - loss: 1.5516 - val_accuracy: 0.3114 - val_loss: 1.7252 Epoch 50/50 42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.3806 - loss: 1.5570 - val_accuracy: 0.3593 - val_loss: 1.7052
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ dense_4 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_5 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_6 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_7 (Dense) │ (None, 11) │ 132 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_8 (Dense) │ (None, 10) │ 120 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 1,946 (7.61 KB)
Trainable params: 648 (2.53 KB)
Non-trainable params: 0 (0.00 B)
Optimizer params: 1,298 (5.07 KB)
In [47]:
# accuracy metrics
history = fitout.history
acc = history['accuracy']
loss = history['loss']
val_acc = history['val_accuracy']
val_loss = history['val_loss']
print("final train accuracy: "+str(acc[-1]))
print("final train loss : "+str(loss[-1]))
print("final val accuracy: "+str(val_acc[-1]))
print("final val loss : "+str(val_loss[-1]))
print( datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + " runtime: " + str(round(time.time()-starttime,3)) + " seconds" )
final train accuracy: 0.3810952603816986 final train loss : 1.5434587001800537 final val accuracy: 0.359281450510025 final val loss : 1.7051748037338257 2024-10-22 07:08:47 runtime: 887.795 seconds
In [48]:
epochs_range = range(numepochs)
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.plot( epochs_range, acc, label='Training Accuracy' )
plt.plot( epochs_range, val_acc, label='Validation Accuracy' )
plt.legend( loc='lower right' )
plt.ylim(0,1)
plt.title('Training and Validation Accuracy', fontsize=15 )
plt.subplot(1,2,2)
plt.plot( epochs_range, loss, label='Training Loss' )
plt.plot( epochs_range, val_loss, label='Validation Loss' )
plt.legend( loc='upper right' )
# plt.ylim(0,1)
plt.title('Training and Validation Loss', fontsize=15 )
plt.show()
Run inference on test data to validate performance¶
In [49]:
# define a function to plot confusion matrix
def plot_confusion_matrix(y_test, y_prediction):
'''Plotting Confusion Matrix'''
cm = metrics.confusion_matrix(y_test, y_prediction)
ax = plt.subplot()
ax = sns.heatmap(cm, annot=True, fmt='', cmap="Blues")
ax.set_xlabel('Prediced labels', fontsize=18)
ax.set_ylabel('True labels', fontsize=18)
ax.set_title('Confusion Matrix', fontsize=25)
ax.xaxis.set_ticklabels(['Bad', 'Good', 'Middle'])
ax.yaxis.set_ticklabels(['Bad', 'Good', 'Middle'])
plt.show()
In [50]:
# define a function to plot classification report
def clfr_plot(y_test, y_pred) :
''' Plotting Classification report'''
cr = pd.DataFrame(metrics.classification_report(y_test, y_pred_rf, digits=3,
output_dict=True)).T
cr.drop(columns='support', inplace=True)
sns.heatmap(cr, cmap='Blues', annot=True, linecolor='white', linewidths=0.5).xaxis.tick_top()
In [65]:
def clf_plot(y_test, y_pred) :
'''
1) Ploting Confusion Matrix
2) Plotting Classification Report
'''
y_predmax = np.argmax(y_pred, axis=1)
y_testmax = np.argmax(y_testcat, axis=1)
# metrics.f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
# metrics.f1_score(y_test, y_pred, average='weighted',zero_division=0)
cm = metrics.confusion_matrix(y_testmax, y_predmax)
cr = pd.DataFrame(metrics.classification_report(y_testmax, y_predmax, digits=3, output_dict=True)).T
cr.drop(columns='support', inplace=True)
fig, ax = plt.subplots(1, 2, figsize=(15, 5))
# Left AX : Confusion Matrix
ax[0] = sns.heatmap(cm, annot=True, fmt='', cmap="Blues", ax=ax[0])
ax[0].set_xlabel('Prediced labels', fontsize=18)
ax[0].set_ylabel('True labels', fontsize=18)
ax[0].set_title('Confusion Matrix', fontsize=25)
# ax[0].xaxis.set_ticklabels(['Bad', 'Good', 'Middle'])
# ax[0].yaxis.set_ticklabels(['Bad', 'Good', 'Middle'])
# Right AX : Classification Report
ax[1] = sns.heatmap(cr, cmap='Blues', annot=True, linecolor='white', linewidths=0.5, ax=ax[1])
ax[1].xaxis.tick_top()
ax[1].set_title('Classification Report', fontsize=25)
plt.show()
In [72]:
# test predict (inference)
y_pred = ann.predict(X_test)
y_predmax = np.argmax(y_pred, axis=1)
y_testmax = np.argmax(y_testcat, axis=1)
cm = confusion_matrix( y_testmax, y_predmax)
print(cm)
#ann_score = round(ann.score(X_test, y_test), 3)
#print('ANN score : ', ann_score)
clf_plot(y_test, y_pred)
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 839us/step [[ 0 2 0 0 0 0 2 0] [ 1 0 1 1 0 1 2 0] [ 2 3 1 2 1 1 0 0] [ 0 4 7 30 18 12 3 0] [ 0 3 6 13 17 24 4 1] [ 0 2 0 1 2 17 4 0] [ 2 3 0 0 0 1 0 0] [ 0 5 0 0 0 0 1 0]]
In [ ]: